Secrecy, Criminal Justice, and Variable Importance
Abstract:
The US justice system often uses a combination of (biased) human decision makers and complicated, black-box proprietary algorithms for high-stakes decisions that deeply affect individuals. This continues despite the fact that, for several years, we have known that interpretable machine learning models predict criminal recidivism just as accurately as complicated machine learning methods. It is much easier to debate the fairness of an interpretable model than that of a proprietary one. In 2016, ProPublica accused the most popular proprietary model, COMPAS, of racial bias, but their analysis was flawed and the true story is much more complicated: their analysis relied on a flawed definition of variable importance to identify the race variable as important.
In this talk, I will start by introducing a very general form of variable importance, called model class reliance. Model class reliance measures how important a variable is to any sufficiently accurate predictive model within a class. I will use this and other data-centered tools to conduct our own investigation of whether COMPAS depends on race, and what else it depends on. Through this analysis, we find another problem with complicated proprietary models: they often seem to be miscomputed. An easy fix to all of this is to use interpretable (transparent) models, rather than complicated or proprietary models, in criminal justice.
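For readers unfamiliar with the concept, here is a minimal sketch of the idea in Python. It is not the method from the talk: model class reliance proper bounds a variable's importance over an entire model class, whereas this sketch merely approximates that range over a finite collection of fitted models whose loss is within `eps` of the best. The permutation-based reliance measure, the log-loss choice, the `eps` threshold, and all names are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import log_loss


def model_reliance(model, X, y, j, n_perm=20, seed=0):
    """Permutation-based reliance of `model` on feature j:
    ratio of the loss with feature j shuffled to the original loss."""
    rng = np.random.default_rng(seed)
    base = log_loss(y, model.predict_proba(X)[:, 1])
    permuted = []
    for _ in range(n_perm):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])
        permuted.append(log_loss(y, model.predict_proba(Xp)[:, 1]))
    return np.mean(permuted) / base


def model_class_reliance(models, X, y, j, eps=0.01):
    """Approximate [MCR-, MCR+]: the range of model reliance over the
    set of models whose loss is within eps of the best loss."""
    losses = [log_loss(y, m.predict_proba(X)[:, 1]) for m in models]
    best = min(losses)
    good_models = [m for m, l in zip(models, losses) if l <= best + eps]
    reliances = [model_reliance(m, X, y, j) for m in good_models]
    return min(reliances), max(reliances)


# Toy usage: two well-performing models may rely on a feature differently.
X, y = make_classification(n_samples=2000, n_features=5, random_state=0)
models = [LogisticRegression(max_iter=1000).fit(X, y),
          DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)]
lo, hi = model_class_reliance(models, X, y, j=0)
print(f"MCR range for feature 0: [{lo:.3f}, {hi:.3f}]")
```

The interpretation is what makes the measure useful: if reliance on a variable is high even at the low end of the range (MCR-), then no well-performing model in the class can avoid depending on that variable; a wide range means equally accurate models disagree about its importance.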
Bio:
Cynthia Rudin is a professor of computer science, electrical and computer engineering, and statistical science at Duke University, and directs the Prediction Analysis Lab, whose main focus is interpretable machine learning. Previously, Prof. Rudin held positions at MIT, Columbia, and NYU. She holds an undergraduate degree from the University at Buffalo and a PhD from Princeton University. She is a three-time winner of the INFORMS Innovative Applications in Analytics Award, was named one of the “Top 40 Under 40” by Poets and Quants in 2015, and was named by Businessinsider.com as one of the 12 most impressive professors at MIT in 2015. She is past chair of both the INFORMS Data Mining Section and the Statistical Learning and Data Science Section of the American Statistical Association. She has also served on committees for DARPA, the National Institute of Justice, and AAAI. She has served on three committees for the National Academies of Sciences, Engineering, and Medicine, including the Committee on Applied and Theoretical Statistics, the Committee on Law and Justice, and the Committee on Analytic Research Foundations for the Next-Generation Electric Grid. She is a fellow of the American Statistical Association and a fellow of the Institute of Mathematical Statistics. She will be the Thomas Langford Lecturer at Duke University during the 2019-2020 academic year.
How Humans Judge Machines
Abstract:
In recent years, advances in big data and algorithms have given rise to a world in which it is finally possible to include algorithmic decision making in the decision pipelines of governments and businesses. In this presentation, I will discuss dozens of experiments documenting differences in how people judge human and A.I. actions.
Bio:
César A. Hidalgo is a Chilean physicist, author, and entrepreneur. He currently holds multiple academic appointments: he is an ANITI Chair at the University of Toulouse, an Honorary Professor at the University of Manchester, and a Visiting Professor at Harvard's School of Engineering and Applied Sciences. From 2010 to 2019, Hidalgo led MIT’s Collective Learning group. Prior to that, he was a research fellow and adjunct faculty member at the Harvard Kennedy School. Hidalgo is also a founder of Datawheel, an award-winning company specializing in the creation of data distribution and visualization solutions. Hidalgo holds a PhD in Physics from the University of Notre Dame and a Bachelor's degree in Physics from Universidad Católica de Chile. Professor Hidalgo is a multidisciplinary scholar working in Complexity, Data Science, Data Visualization, Economic Geography, and Artificial Intelligence. Hidalgo’s contributions have been recognized with numerous awards, including the 2018 Lagrange Prize, three Webby Awards (DataUSA, DataAfrica, & Streetchange), an Information is Beautiful Award (DataUSA), and the Bicentennial Medal of the Chilean Congress, among others. Hidalgo is the author of Why Information Grows (Basic Books, 2015), which has been translated into more than ten languages, and an author of The Atlas of Economic Complexity (MIT Press, 2014). Hidalgo is ranked in the top 5% of all economic authors on IDEAS/RePEc in eight categories, including citations discounted by age and abstract views.
Particle Physics in the context of Data Science
Abstract:
Particle physics is a field awash in data, equipped with a powerful theoretical foundation, and encumbered with the realities of large international scientific collaborations. The field has coped with very large datasets and taken advantage of machine learning techniques for decades. It also provides a unique perspective on issues around open data, reproducibility, multiple hypothesis testing, and meta-analysis. I will give an overview of the lessons I’ve learned from considering particle physics in the broader context of data science, both in terms of methodology and as a training ground for future data scientists.
Bio:
Kyle Cranmer is a Professor of Physics and Data Science at New York University. He obtained his Ph.D. in Physics from the University of Wisconsin-Madison in 2005 and his B.A. in Mathematics and Physics from Rice University. His early research focused primarily on experimental particle physics with CERN's Large Hadron Collider in Geneva, Switzerland. He was awarded the Presidential Early Career Award for Science and Engineering in 2007 and the National Science Foundation's CAREER Award in 2009. Professor Cranmer developed a framework that enables collaborative statistical modeling, which was used extensively in the discovery of the Higgs boson in July 2012. His current interests lie at the intersection of physics, statistics, and machine learning, and incorporate themes of scalability, reproducibility, and interpretability. He is currently the Executive Director of the Moore-Sloan Data Science Environment at NYU.